An analysis of gene/protein associations at PubMed scale
Abstract
BACKGROUND: Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts to apply event extraction methods to the entire PubMed literature database, far beyond the narrow subdomains of biomedicine for which annotated resources for extraction method development are available.
RESULTS: In the present study, our aim is to estimate how much of the full range of gene/protein association statements in PubMed existing resources for event extraction can cover. We base our analysis on a recently released corpus, automatically annotated for gene/protein entities and syntactic analyses, that covers all of PubMed. We use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier to identify likely statements of gene/protein associations. A set of high-frequency/high-likelihood association statements is then manually analyzed with reference to the GENIA ontology.
CONCLUSIONS: We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that are not addressed by these resources, suggesting directions for further extension of extraction coverage.
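As an illustration of the shortest dependency path idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a sentence's dependency parse is available as a list of (head, dependent) token-index pairs, treats the parse as an undirected graph, and finds the shortest path between two entity tokens by breadth-first search. The function name, the input format and the toy sentence are all hypothetical.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return token indices on the shortest path from source to target.

    edges: iterable of (head, dependent) token-index pairs
           (an assumed input format, not taken from the paper).
    The dependency graph is treated as undirected for path finding.
    Returns None if the two tokens are not connected.
    """
    # Build an undirected adjacency list from the dependency edges.
    neighbors = {}
    for head, dep in edges:
        neighbors.setdefault(head, []).append(dep)
        neighbors.setdefault(dep, []).append(head)

    # Breadth-first search, remembering each node's predecessor.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk predecessors back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Toy sentence (hypothetical): "ProtA binds ProtB in cells"
tokens = ["ProtA", "binds", "ProtB", "in", "cells"]
edges = [(1, 0), (1, 2), (1, 4), (4, 3)]  # (head, dependent)
path = shortest_dependency_path(edges, 0, 2)
path_tokens = [tokens[i] for i in path]
```

In this toy case the path between the two entity mentions runs through the verb that links them. A real pipeline would take the edges from a syntactic parser and would typically keep the dependency labels along the path as well.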
Similar resources
Bioinformatic and empirical analysis of a gene encoding serine/threonine protein kinase regulated in response to chemical and biological fertilizers in two maize (Zea mays L.) cultivars
The molecular structure of a gene, ZmSTPK1, encoding a serine/threonine protein kinase in maize was analyzed using bioinformatic tools, and its expression pattern was studied under chemical and biological fertilizers. Bioinformatic analysis revealed that ZmSTPK1 is located on chromosome 10, from position 141015332 to 141017582. The full genomic sequence of the gene is 2251 bp in length and includes 2 exons. ...
Systematic identification of latent disease-gene associations from PubMed articles
Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to large data volume and complicated ...
Expression of Recombinant Factor IX Using the Transient Gene Expression Technique
Background: Pilot and large-scale production of recombinant proteins requires stable clones capable of producing large quantities of recombinant proteins. Not only is the process of selecting stable clones time-consuming, but the continuous culturing of clones in large-scale production may also cause loss of incoming plasmid and recombinant genes. Thus, considering the advanceme...
Evaluation of Cell Penetrating Peptide Delivery System on HPV16E7 Expression in Three Types of Cell Line
Background: The poor permeability of the plasma and nuclear membranes to DNA plasmids presents two major barriers to the development of these therapeutic molecules. Therefore, success in gene therapy approaches depends on the development of efficient and safe non-viral delivery systems. Objectives: The aim of this study was to investigate the in vitro delivery of plasmid DNA encoding HPV16 E7 gene...
In silico Analysis and Expression of Osmotin-EAAAK-LTP Fused Protein
Antifungal agents are causing different problems in the agriculture industry. Plants use various defense mechanisms to resist fungal pathogens. Examples of these mechanisms include forming physical barriers and producing chemical components and pathogenesis-related proteins such as lipid transfer protein (LTP) and Osmotin, which can inhibit the growth of fungi at micro-molar conc...